blasthttp integration #2992

Open
liquidsec wants to merge 62 commits into dev from blasthttp-integration-clean

liquidsec (Contributor) commented Mar 27, 2026

Summary

Replaces BBOT's entire HTTP infrastructure with blasthttp, a Rust-based HTTP library with Python bindings. This eliminates the httpx Go binary subprocess, the curl subprocess helper, and the HTTPEngine ZMQ subprocess — all HTTP now runs in-process through a shared blasthttp client.

What changed

New HTTP engine:

  • All HTTP requests go through helpers.request() → WebHelper → shared blasthttp.BlastHTTP() client
  • Rate limiting via web.http_rate_limit config, enforced at the client level across all callers
  • resolve_ip parameter for DNS pinning (like curl --resolve) — connects to a specific IP while preserving hostname for Host header and TLS SNI
  • request_target parameter for request-line override (SSRF/smuggling testing)
  • TLS certificate info (CN, SANs, issuer) available on every HTTPS response via response.cert_info
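
To make the resolve_ip and request_target semantics concrete, here is a minimal standalone sketch. It is not blasthttp's or BBOT's implementation: plan_pinned_request is a hypothetical helper that opens no connection and only computes where a client would connect, which name it would send as TLS SNI, and what the request head would look like.

```python
from urllib.parse import urlsplit


def plan_pinned_request(url, resolve_ip=None, request_target=None):
    """Hypothetical helper: return (connect_host, port, sni_hostname, request_head)."""
    parts = urlsplit(url)
    hostname = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)

    # DNS pinning: the TCP connection goes to resolve_ip when given,
    # while the hostname is still used for the Host header and TLS SNI.
    connect_host = resolve_ip or hostname
    sni_hostname = hostname if parts.scheme == "https" else None

    # Request-line override: by default the target is the path plus query.
    default_target = parts.path or "/"
    if parts.query:
        default_target += "?" + parts.query
    target = request_target if request_target is not None else default_target

    # Include the port in Host only when the URL specifies one explicitly.
    host_header = hostname if parts.port is None else f"{hostname}:{parts.port}"
    head = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return connect_host, port, sni_hostname, head.encode()
```

With resolve_ip="10.0.0.5" and an https://example.com URL, the sketch connects to 10.0.0.5:443 but keeps example.com for SNI and Host, which is the behavior the description compares to curl --resolve.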

Shared event loop (blasthttp 0.2.0):

  • blasthttp upgraded to 0.2.0 which uses pyo3-async-runtimes to return native Python coroutines via future_into_py()
  • All run_in_executor_io() wrappers around blasthttp calls replaced with direct await — HTTP requests no longer consume OS threads
  • IO thread pool shrunk (only remaining caller is wafw00f)
  • Thread pool backlog status line removed (no longer relevant)
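
The difference between an executor wrapper and a direct await can be illustrated with plain asyncio. This is a sketch under stated assumptions: blocking_fetch and native_fetch are hypothetical stand-ins (not blasthttp calls), and the run_in_executor_io() wrapper is approximated here by the standard loop.run_in_executor().

```python
import asyncio
import threading
import time


def blocking_fetch(url):
    """Hypothetical blocking call: it has to run in a worker thread."""
    time.sleep(0.01)
    return url, threading.get_ident()


async def native_fetch(url):
    """Hypothetical native coroutine: it suspends without occupying a thread."""
    await asyncio.sleep(0.01)
    return url, threading.get_ident()


async def compare(url):
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    # Old pattern: hand the blocking call to the default thread pool.
    _, executor_thread = await loop.run_in_executor(None, blocking_fetch, url)
    # New pattern: await the coroutine directly on the event loop thread.
    _, direct_thread = await native_fetch(url)
    return {
        "executor_on_loop_thread": executor_thread == loop_thread,
        "direct_on_loop_thread": direct_thread == loop_thread,
    }
```

Running asyncio.run(compare(...)) shows the executor call landing on a pool thread while the awaited coroutine stays on the loop thread, which is why the direct-await form no longer consumes an OS thread per request.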

Removed:

  • httpx Go binary module and its test (bbot/modules/httpx.py)
  • ffuf Go binary module and its test (bbot/modules/ffuf.py, ffuf_shortnames.py)
  • HTTPEngine ZMQ subprocess (bbot/core/helpers/web/engine.py)
  • AsyncClient / httpx Python library (bbot/core/helpers/web/client.py)
  • helpers.web.curl() subprocess helper
  • DEP_FFUF and DEP_CURL shared dependency definitions
  • httpx Python library from dependencies

Added:

  • bbot/modules/http.py — native HTTP module using blasthttp batch API (replaces httpx Go binary)
  • bbot/modules/web_brute.py — native web fuzzer using blasthttp batch API (replaces ffuf)
  • bbot/modules/web_brute_shortnames.py — IIS shortname resolver using ML prediction
  • bbot/modules/generic_ssrf.py — SSRF detection module
  • bbot/modules/output/webhook.py — renamed from output/http.py to avoid collision with scan module
  • bbot/core/helpers/web/blast_response.py — response wrapper for blasthttp PyO3 objects
  • bbot/test/mock_blasthttp.py — mock infrastructure for test HTTP interception
  • Rate limit test (test_web_rate_limit.py)
  • Download timeout default (5 minutes) for large wordlist files

Updated:

  • sslcert module rewritten to use blasthttp cert_info instead of independent pyOpenSSL connections
  • host_header and generic_ssrf modules converted from curl() to request() with resolve_ip/request_target
  • elastic output module fixed to import from webhook instead of deleted http
  • All module tests updated for blasthttp mock API (blasthttp_mock.add_response())
  • Presets updated: dirbust-light, dirbust-heavy, dotnet-audit reference web_brute instead of ffuf
  • Test mock conftest passes through resolve_ip=127.0.0.1 requests to real blasthttp

Bug fixes

  • Blacklist.get() NoneType crash: _make_event_seed() could return None when host validation fails, causing an AttributeError on .host access. Events hitting this path were silently dropped from the scan pipeline. Added a None guard with defensive tests.
  • DNS CNAME escaped quotes: clean_dns_record() didn't strip quote characters that dnspython's to_text() can produce on certain record types, causing a ValidationError and silently skipping DNS children. Added .strip("'\"") before rstrip(".").
  • Certspotter rate-limit crash: the API returns a JSON dict (not a list) when rate-limited, and iterating over its keys raised AttributeError: 'str' object has no attribute 'get'. Added an isinstance(json, list) guard.
  • Excavate YARA blocking event loop: yara_rules.match() was called synchronously on the event loop, serializing all 8 excavate workers despite _module_threads = 8. Offloaded to run_in_executor_cpu() for real parallelism (YARA releases the GIL). ~2.5-3x throughput improvement.
  • Stale ip-* and http-title-* tags: the new http module was creating ip-{hostname} and http-title-{title} tags instead of using _resolved_hosts and http_title like the old httpx module did. Fixed to match the post-naming-standardization conventions from PR #2986 ("Preset naming standardization / tag cleanup").
  • _drain_queues() infinite loop on Ctrl+C: when a module's queue was None/False, the drain loop never raised QueueEmpty and spun forever, causing the scan to hang after "Aborting scan" until a second Ctrl+C. Moved the None/False check outside the loop.
  • web_brute sequential bottleneck: _module_threads defaulted to 1, meaning queued URLs were fuzzed one at a time. Bumped to 4. Added a configurable concurrency option. Wired the previously unused rate option through to blasthttp request_batch.
  • blasthttp rate limit override: when both a global rate limit and a per-call rate limit were set, the global always won unconditionally. Fixed in blasthttp 0.1.4 (blacklanternsecurity/blasthttp#12) to use min(global, per_call) so modules can enforce tighter limits.
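
Two of the fixes above are small enough to sketch in isolation. The functions below are simplified stand-ins, not the BBOT or blasthttp code: clean_dns_record here only reproduces the strip order named in the description, and effective_rate_limit is a hypothetical name that assumes None means "no limit set".

```python
def clean_dns_record(record):
    """Simplified stand-in: strip stray quote characters, then the trailing dot."""
    return str(record).strip("'\"").rstrip(".")


def effective_rate_limit(global_limit=None, per_call_limit=None):
    """Return the tighter of two optional limits (assumption: None means unset)."""
    limits = [limit for limit in (global_limit, per_call_limit) if limit is not None]
    # min() lets a module tighten the global limit but never loosen it.
    return min(limits) if limits else None
```

Stripping quotes before the trailing dot matters: with the order reversed, a record such as "cdn.example.com." wrapped in quotes would keep its dot, because the dot is not the last character until the quote is gone.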

Dependency changes

  • Added: blasthttp>=0.2.0
  • Removed: httpx>=0.28.1

Merge from dev + baddns 2.2.0

Merged dev into this branch and resolved the baddns conflicts by bumping to baddns 2.2.0, which ships the blasthttp-based rewrite:

  • Upstream baddns now takes an http_client instance (renamed from http_client_class) and defaults to a shared BlastHTTP() when none is provided
  • Baddns internals (HttpManager, mtasts, references, matcher, etc.) fully converted off httpx: response.status/response.body attribute shapes, list-of-tuple headers normalized via a new headers_to_dict() helper
  • bbot/modules/baddns.py and baddns_direct.py now pass http_client=self.helpers.blasthttp and dns_client=self.scan.helpers.dns.blastdns — baddns submodules reuse the same rate-limited shared client as the rest of bbot
  • Replaced baddns~=2.1.0 with baddns~=2.2.0 in module deps_pip and pyproject.toml

Additional bug fix

  • portscan ping_first mode losing hostname correlation: emit_open_port() creates a fresh DNS_NAME event for each alive host, which is immediately fed into a second make_targets() call for the SYN scan. Because dnsresolve runs asynchronously after emit_event, that new event's resolved_hosts was empty when the SYN correlator read it, so the hostname silently dropped out of the correlator. Ports found on shared IPs then propagated only to the IP-range parent, and OPEN_TCP_PORT events for the hostnames were intermittently missing. Fixed by seeding event._resolved_hosts from the parent DNS_NAME at creation time.

Post-merge cleanup

  • Restore _task_counter on precheck/postcheck without re-introducing event-loop saturation — The earlier saturation fix had stripped _task_counter.count(...) wrappers from precheck/postcheck on the per-event hot path, but those wrappers are load-bearing for finished-detection (is_finished reads _task_counter.value > 0). Without them a module can be marked finished while events are still mid-check. Removed the asyncio.Lock from TaskCounter instead — the lock had no real synchronization role on the single-threaded event loop, just two forced event-loop yields per count() call — then re-added the wrappers in _events_waiting and BaseInterceptModule._worker. Hot-path cost is now a uuid + dict insert/pop with no event-loop yield. (bbot/core/helpers/async_helpers.py, bbot/modules/base.py)

  • Remove vestigial _ensure_minimal_target shim — Originally added so WebHelper.__init__ (which reads preset.target at construction) could be safely accessed before Preset.bake() ran. That pre-bake helpers path lives on the asn-as-targets / scope-rework branch where Scanner.helpers falls back to self._unbaked_preset.helpers when self.preset is None. On this branch Scanner.__init__ synchronously bakes the preset, so by the time anyone reads .helpers the target already exists. Verified the shim never fires by adding a diagnostic raise inside it and running step_1 + sample module tests + bbot --help/--list-modules.

  • Remove BBOTTarget pickle support: BaseTarget.__getstate__/__setstate__ and ScanBlacklist.__setstate__ existed solely so BBOTTarget could be passed to the HTTPEngine ZMQ subprocess via server_kwargs. That subprocess is gone (blasthttp runs in-process), so the workaround for RadixTarget (PyO3) being un-picklable is no longer needed. Also dropped the test_target_pickle round-trip test.

  • Restore compact form of test_manager_scope_accuracy.py and harden the ruff exemption — The file is excluded from ruff format via pyproject.toml's format.exclude, but explicit-file invocations (pre-commit hooks, ruff format <file>) bypass the exclude unless force-exclude = true is also set. The file had been re-expanded from ~900 to 3209 lines as a side-effect. Re-compacted via ruff format --line-length=250 --config 'format.skip-magic-trailing-comma=true' (down to 871 lines) and added force-exclude = true so the existing exclude is honored on explicit invocations too.
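
The TaskCounter change in the first cleanup item can be pictured with a rough sketch. This is not the code in bbot/core/helpers/async_helpers.py: the synchronous context manager, the _tasks dict, and the is_finished helper with its queue_size argument are all assumptions made for illustration. Only the idea (a uuid plus a dict insert/pop, with no lock and no await) comes from the description.

```python
import uuid
from contextlib import contextmanager


class TaskCounter:
    """Lock-free in-flight task tracker for a single-threaded event loop."""

    def __init__(self):
        self._tasks = {}

    @property
    def value(self):
        # Number of tasks currently inside a count() block.
        return len(self._tasks)

    @contextmanager
    def count(self, name):
        # One uuid and one dict insert on entry, one pop on exit. There is
        # no await here, so entering and leaving never yields to the loop.
        task_id = uuid.uuid4()
        self._tasks[task_id] = name
        try:
            yield
        finally:
            self._tasks.pop(task_id, None)


def is_finished(counter, queue_size):
    """Hypothetical check: finished only if nothing is queued or in flight."""
    return queue_size == 0 and counter.value == 0
```

Wrapping precheck/postcheck in count() keeps value above zero while an event is mid-check, so a finished check of this shape cannot report a module as done too early.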

- Add blasthttp (>=0.1.3) as HTTP engine, remove httpx subprocess dependency
- Remove HTTPEngine subprocess, all HTTP now in-process via shared blasthttp client
- Remove curl helper, use request() with resolve_ip and request_target
- Remove obsolete ffuf module (replaced by web_brute)
- Remove obsolete httpx module (replaced by http)
- Add native http module using blasthttp batch API
- Add native web_brute module using blasthttp batch API
- Add web_brute_shortnames module
- Add generic_ssrf module
- Rewrite sslcert to use blasthttp cert_info
- Add blasthttp mock infrastructure for tests
- Add resolve_ip passthrough in test conftest for localhost
- Add rate limit tests
- Add 5-minute default timeout for downloads
- Rename output http module to webhook
- Fix elastic output module import
- Update all module tests for blasthttp mock API
- Remove DEP_FFUF and DEP_CURL from shared_deps.py
- Remove ffuf version config from defaults.yml
- Update presets (dirbust-light, dirbust-heavy, dotnet-audit) to use
  web_brute/web_brute_shortnames instead of ffuf/ffuf_shortnames
- Update test_scan.py module stat tests for renamed modules
- Remove httpx Python library from dependencies
- Update telerik comment
liquidsec changed the title from "Replace httpx/curl with blasthttp HTTP engine" to "Replace httpx (python)/httpx (project discovery)/curl/ffuf, etc with blasthttp" on Mar 27, 2026
github-actions bot (Contributor) commented Mar 27, 2026

📊 Performance Benchmark Report

Comparing dev (baseline) vs blasthttp-integration-clean (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 3.92ms | 3.93ms | +0.2% | |
| Bloom Filter Large Scale Dns Brute Force | 17.62ms | 17.72ms | +0.6% | |
| Large Closest Match Lookup | 332.62ms | 328.90ms | -1.1% | |
| Realistic Closest Match Workload | 179.49ms | 176.65ms | -1.6% | |
| Event Memory Medium Scan | 1784 B/event | 1784 B/event | +0.0% | |
| Event Memory Large Scan | 1768 B/event | 1768 B/event | +0.0% | |
| Event Validation Full Scan Startup Small Batch | 371.78ms | 370.75ms | -0.3% | |
| Event Validation Full Scan Startup Large Batch | 533.32ms | 523.63ms | -1.8% | |
| Make Event Autodetection Small | 26.02ms | 25.51ms | -2.0% | |
| Make Event Autodetection Large | 265.75ms | 261.84ms | -1.5% | |
| Make Event Explicit Types | 11.48ms | 11.35ms | -1.1% | |
| Excavate Single Thread Small | 3.500s | 3.381s | -3.4% | |
| Excavate Single Thread Large | 9.282s | 9.189s | -1.0% | |
| Excavate Parallel Tasks Small | 3.649s | 3.562s | -2.4% | |
| Excavate Parallel Tasks Large | 6.970s | 6.185s | -11.3% | 🟢🟢 🚀 |
| Is Ip Performance | 2.90ms | 2.91ms | +0.3% | |
| Make Ip Type Performance | 10.69ms | 10.58ms | -1.0% | |
| Mixed Ip Operations | 4.19ms | 4.14ms | -1.4% | |
| Memory Use Web Crawl | 41.8 MB | 148.9 MB | +256.7% | 🔴🔴🔴 ⚠️ |
| Memory Use Subdomain Enum | 19.4 MB | 19.4 MB | +0.1% | |
| Scan Throughput 100 | 7.419s | 3.504s | -52.8% | 🟢🟢🟢 🚀 |
| Scan Throughput 1000 | 33.985s | 26.929s | -20.8% | 🟢🟢🟢 🚀 |
| Typical Queue Shuffle | 55.64µs | 58.03µs | +4.3% | |
| Priority Queue Shuffle | 601.22µs | 629.51µs | +4.7% | |

🎯 Performance Summary

+ 3 improvements 🚀
! 1 regression ⚠️
  20 unchanged ✅

🔍 Significant Changes (>10%)

  • Excavate Parallel Tasks Large: 11.3% 🚀 faster
  • Memory Use Web Crawl: 256.7% 🐌 more memory
  • Scan Throughput 100: 52.8% 🚀 faster
  • Scan Throughput 1000: 20.8% 🚀 faster

🐍 Python Version 3.11.15

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 90.52854% with 224 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (5be4993) to head (554feb4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/core/helpers/web/web.py | 71% | 62 Missing ⚠️ |
| bbot/modules/web_brute.py | 78% | 55 Missing ⚠️ |
| bbot/test/mock_blasthttp.py | 91% | 22 Missing ⚠️ |
| bbot/core/helpers/web/blast_response.py | 84% | 15 Missing ⚠️ |
| bbot/modules/generic_ssrf.py | 92% | 13 Missing ⚠️ |
| bbot/modules/http.py | 94% | 11 Missing ⚠️ |
| bbot/scanner/scanner.py | 34% | 10 Missing ⚠️ |
| bbot/modules/sslcert.py | 80% | 8 Missing ⚠️ |
| bbot/test/test_step_1/test_python_api.py | 67% | 5 Missing ⚠️ |
| bbot/core/helpers/command.py | 67% | 4 Missing ⚠️ |
... and 11 more
Additional details and impacted files
@@          Coverage Diff           @@
##             dev   #2992    +/-   ##
======================================
- Coverage     91%     91%    -0%     
======================================
  Files        437     440     +3     
  Lines      37509   38098   +589     
======================================
+ Hits       33925   34361   +436     
- Misses      3584    3737   +153     


This was referenced Mar 27, 2026
liquidsec added 18 commits April 3, 2026 17:08
…tion

Reduce event loop saturation for scan throughput
Make BlasthttpResponse.__str__ return repr instead of response body,
and fix diff.py baseline error message to format responses safely.
Replace httpx with http in TestIIS_Shortnames_GatewayError modules_overrides.
@liquidsec liquidsec changed the base branch from 3.0 to dev April 16, 2026 18:10
Resolved baddns module conflicts by switching to baddns 2.2.0's new
http_client= injection (instance, not class): modules pass
self.helpers.blasthttp as http_client and self.scan.helpers.dns.blastdns
as dns_client. Bumped baddns dep to ~=2.2.0 in pyproject and both
module files.
The ping event created in emit_open_port was being fed straight into
the SYN-scan make_targets() call, but dnsresolve hadn't yet populated
its resolved_hosts (emit_event is async). Hostnames silently dropped
out of the SYN correlator, so ports found on shared IPs only
propagated to IP-range parents — OPEN_TCP_PORT events for hostnames
were intermittently missing.
@TheTechromancer (Collaborator)

Benchmark:

| Library | Time | QPS | Success | Failed | vs httpx |
|---|---|---|---|---|---|
| go-stdlib | 0.140s | 143,013 | 20,000 | 0 | 158.31x |
| blasthttp-python | 0.151s | 132,217 | 20,000 | 0 | 146.35x |
| blasthttp-cli | 0.260s | 76,944 | 20,000 | 0 | 85.17x |
| c-libcurl | 0.276s | 72,578 | 20,000 | 0 | 80.34x |
| blasthttp-python-200k | 20.361s | 982 | 20,000 | 0 | 1.09x |
| blasthttp-cli-200k | 20.617s | 970 | 20,000 | 0 | 1.07x |
| httpx | 22.139s | 903 | 20,000 | 0 | 1.00x |

Waiting on rate limiter improvement:

…clean

# Conflicts:
#	bbot/modules/paramminer_cookies.py
#	pyproject.toml
…ent-loop saturation

Drop the asyncio.Lock from TaskCounter (single-threaded event loop, no awaits inside the
protected region — the lock was pure overhead). With count() now yield-free, re-add the
wrappers around precheck/postcheck in _events_waiting and BaseInterceptModule._worker
so modules can't be marked finished while events are still mid-check.
The shim was needed during the asn-as-targets / scope-rework work where
Scanner.helpers fell back to self._unbaked_preset.helpers when self.preset
was None. On this branch Scanner.__init__ synchronously bakes the preset,
so by the time anyone reads .helpers the target already exists. Verified
by sticking a raise inside _ensure_minimal_target and running step_1 +
sample module tests + bbot --help/--list-modules — never fired.
The HTTP engine subprocess that consumed pickled targets is gone — blasthttp
runs in-process. Drop BaseTarget.__getstate__/__setstate__, ScanBlacklist.__setstate__,
and the test_target_pickle round-trip test.
…f exemption

The file was supposed to be excluded from ruff format (format.exclude lives in
pyproject.toml), but explicit-file invocations (pre-commit etc.) bypass
format.exclude unless force-exclude is also set. The file got re-expanded
from ~900 to 3200 lines as a result.

- Re-compact via ruff format with line-length=250 and skip-magic-trailing-comma
- Set force-exclude=true so the existing format.exclude is honored even when
  the file is passed explicitly

2 participants